Introduction to Recurrent Neural Networks
In this article, we explore one of the most widely used and interesting types of neural network, which is applied to numerous tasks, including forecasting and stock market prediction.
What Is a Recurrent Neural Network?
A recurrent neural network (RNN) is a special type of artificial neural network (ANN) designed for time-series or other sequential data. Feedforward neural networks are appropriate when data points are independent of each other. In sequential data, however, each point depends on the ones before it, so the network must be modified to capture those dependencies. RNNs do this with a form of memory: they store state from previous inputs and use it when generating the next output in the sequence.
An RNN saves the output of a layer and feeds it back into that layer's input, so that each prediction can draw on what came before. As the above image shows, you can convert a normal feedforward neural network into an RNN: the nodes in the different layers of the network are compressed to form a single recurrent layer. In the image below, A, B, and C are the parameters of the network.
Here, x represents the input layer, h denotes the hidden layer, and y is the output layer. A, B, and C are the network parameters that are adjusted during training to improve the model's output. At any given timestep t, the hidden state h(t) is computed from the current input x(t) and the previous hidden state h(t-1). The output is compared with the target, and the resulting error is propagated back through the network by backpropagation to update the parameters.
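As a rough illustration of a single recurrent step, here is a minimal NumPy sketch. It assumes a tanh activation, and the names W_xh, W_hh, and W_hy are hypothetical: the article does not say which of A, B, and C corresponds to which weight matrix, so no mapping is implied.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One timestep: combine the current input with the previous hidden state."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # new hidden state
    y_t = W_hy @ h_t + b_y                           # output at this timestep
    return h_t, y_t

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 3, 4, 2  # illustrative sizes (assumed)
W_xh = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
W_hy = rng.normal(size=(output_dim, hidden_dim))  # hidden-to-output weights
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

h0 = np.zeros(hidden_dim)          # no memory before the first input
x1 = rng.normal(size=input_dim)
h1, y1 = rnn_step(x1, h0, W_xh, W_hh, W_hy, b_h, b_y)
print(h1.shape, y1.shape)
```

The returned h1 would be passed in as h_prev for the next input, which is how information from earlier timesteps reaches later ones.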
Why Recurrent Neural Networks?
This is an important question that needs to be answered to better understand RNNs. Every invention, upgrade, or update offers effective solutions to existing problems. RNNs were created to solve several issues of feedforward neural networks such as:
- Feedforward neural networks cannot handle sequential data.
- Feedforward neural networks consider only the current input.
- Feedforward neural networks cannot memorize previous inputs.
RNNs address all three problems. They can handle sequential data, and they take into account both the current input and previously received inputs, which they retain in their internal memory.
Types of Recurrent Neural Networks
There are different types of RNNs with varying architectures. They are:
One-to-one
This is the plain neural network case. It maps a fixed-size input to a fixed-size output, and both are independent of any previous information or output. Image classification is a typical example.
One-to-many
It deals with a fixed size of information as input, which gives a sequence of data as output. A fitting example would be image captioning, which takes in the image as input and gives a sequence of words as output.
Many-to-one
It takes in a sequence of information as input and gives a fixed-size output. For example, it is used in sentiment analysis, where a sentence is classified as expressing positive or negative sentiment.
Many-to-many
This type of RNN takes in a sequence of information as input and produces a sequence of data as output. It is applied in machine translation, where the RNN reads a sentence in one language and outputs it in another.
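The difference between many-to-one and many-to-many comes down to which hidden states are read out. The sketch below uses random numbers as a stand-in for the hidden states an RNN would produce, and W_out is a hypothetical readout matrix. It shows only the simplest many-to-many case, where every input step has its own output. Translation systems usually use a separate encoder and decoder instead.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, hidden_dim = 5, 4  # illustrative sizes (assumed)

# Stand-in for the hidden states an RNN would produce, one row per timestep.
hidden_states = rng.normal(size=(timesteps, hidden_dim))
W_out = rng.normal(size=(hidden_dim, 1))  # hypothetical readout weights

# Many-to-one: read out only from the final hidden state.
many_to_one = hidden_states[-1] @ W_out

# Many-to-many: read out from every hidden state.
many_to_many = hidden_states @ W_out

print(many_to_one.shape, many_to_many.shape)
```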
How Do Recurrent Neural Networks Work?
In RNNs, the information cycles through the loop to the middle hidden layer.
The input layer, x, takes in the input to the neural network, processes it, and passes it on to the middle layer. The middle layer, h, can consist of multiple hidden layers, each with its own activation functions, weights, and biases. In an ordinary feedforward network, the parameters of these hidden layers are independent of one another and nothing is carried over from one input to the next, so the network has no memory. An RNN is the alternative to use when that memory is needed.
The RNN standardizes the activation functions, weights, and biases so that every hidden layer has the same parameters. Then, instead of creating multiple hidden layers, it creates a single layer and loops over it as many times as required.
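The idea of looping over one layer with shared parameters can be sketched in a few lines of NumPy. The function name rnn_forward, the tanh activation, and the sizes are assumptions made for illustration. The thing to notice is that the parameter count stays the same whether the sequence has 5 steps or 50.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Apply the same recurrent layer to every timestep of `inputs`."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:  # one loop iteration per timestep, same weights each time
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(1)
input_dim, hidden_dim = 3, 4
W_xh = rng.normal(size=(hidden_dim, input_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

# The parameter count depends only on the layer sizes, not on sequence length.
n_params = W_xh.size + W_hh.size + b_h.size

short_run = rnn_forward(rng.normal(size=(5, input_dim)), W_xh, W_hh, b_h)
long_run = rnn_forward(rng.normal(size=(50, input_dim)), W_xh, W_hh, b_h)
print(short_run.shape, long_run.shape, n_params)
```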
RNN Architecture
Bidirectional Recurrent Neural Networks (BRNNs)
While unidirectional RNNs can only draw on previous inputs to make predictions about the current state, BRNNs can also pull in future data to improve their accuracy. For example, a missing word in the middle of a phrase is much easier to predict when the words that follow it are known as well as the words that precede it.
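One common way to build a bidirectional layer is to run two independent RNNs, one over the sequence and one over its reverse, and concatenate their states at each timestep. The sketch below assumes that construction. The helper names and sizes are hypothetical, and biases are omitted for brevity.

```python
import numpy as np

def run_rnn(inputs, W_xh, W_hh):
    """Plain tanh RNN over a sequence; returns one hidden state per timestep."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(W_xh @ x_t + W_hh @ h)
        states.append(h)
    return np.stack(states)

def bidirectional(inputs, fwd, bwd):
    """Concatenate a left-to-right pass with a right-to-left pass."""
    forward_states = run_rnn(inputs, *fwd)
    # Run on the reversed sequence, then flip back so rows line up by timestep.
    backward_states = run_rnn(inputs[::-1], *bwd)[::-1]
    return np.concatenate([forward_states, backward_states], axis=1)

rng = np.random.default_rng(2)
T, input_dim, hidden_dim = 6, 3, 4

def make_weights():
    return (rng.normal(size=(hidden_dim, input_dim)),
            rng.normal(size=(hidden_dim, hidden_dim)))

fwd, bwd = make_weights(), make_weights()  # separate weights per direction
x = rng.normal(size=(T, input_dim))
out = bidirectional(x, fwd, bwd)
print(out.shape)
```

Each row of out holds a summary of the past in its first half and a summary of the future in its second half.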
Long short-term memory (LSTM)
LSTM is a popular recurrent neural network architecture used in deep learning. Like other RNNs, it has feedback connections, which feedforward neural networks lack, so it can process entire sequences of data rather than only single data points. LSTM is applied to tasks such as connected handwriting recognition, speech recognition, and network traffic anomaly detection.
A common LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell. This makes LSTM well suited to processing, classifying, and making predictions based on time-series data, since there can be lags of unknown duration between important events in a time series. LSTM was designed to mitigate the vanishing gradient problem encountered when training traditional RNNs.
LSTM remains a widely used choice for sequence-to-sequence and time-series problems. Its main disadvantage is training cost: even a simple model can take considerable time and system resources to train. This is largely a hardware constraint, which eases as hardware becomes more efficient.
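The cell and its three gates can be written out as one timestep in NumPy. This is a sketch of the commonly described LSTM formulation, not of any particular library's internals. Stacking all four gate weight blocks into a single matrix W, and the order of the gates within it, are assumptions made to keep the code short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM timestep. W maps [h_prev, x_t] to four gate pre-activations."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * n:1 * n])   # forget gate: how much old cell content to keep
    i = sigmoid(z[1 * n:2 * n])   # input gate: how much new content to write
    o = sigmoid(z[2 * n:3 * n])   # output gate: how much of the cell to expose
    g = np.tanh(z[3 * n:4 * n])   # candidate cell values
    c_t = f * c_prev + i * g      # updated cell state
    h_t = o * np.tanh(c_t)        # updated hidden state
    return h_t, c_t

rng = np.random.default_rng(3)
input_dim, hidden_dim = 3, 4  # illustrative sizes (assumed)
W = rng.normal(size=(4 * hidden_dim, hidden_dim + input_dim))
b = np.zeros(4 * hidden_dim)

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

Because the cell state is updated additively (f * c_prev + i * g) rather than being squashed through an activation at every step, information can persist across many timesteps.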
Gated Recurrent Units (GRUs)
This architecture is similar to LSTM, because GRUs also address the short-term memory problem of RNN models. GRUs use a hidden state instead of a separate cell state, and two gates instead of three: the reset gate and the update gate. As in LSTM, these gates control how much information to retain and which information to retain.
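A single GRU timestep can be sketched the same way. The weight names in the dictionary are hypothetical, and biases are left out for brevity. Note that references differ on whether the update gate z multiplies the old state or the candidate, so the blending line below follows just one of the two conventions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU timestep; `p` is a dict of weight matrices (hypothetical names)."""
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)   # update gate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)   # reset gate
    # The reset gate decides how much of the old state feeds the candidate.
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev))
    # The update gate blends the old state with the candidate.
    return (1.0 - z) * h_prev + z * h_cand

rng = np.random.default_rng(4)
input_dim, hidden_dim = 3, 4  # illustrative sizes (assumed)
params = {}
for gate in ("z", "r", "h"):
    params["W_" + gate] = rng.normal(size=(hidden_dim, input_dim))
    params["U_" + gate] = rng.normal(size=(hidden_dim, hidden_dim))

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h = gru_step(x_t, h, params)
print(h.shape)
```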
Gradient Problem Solutions
LSTMs are an effective way to deal with gradient problems, which show up as difficulty learning long-term dependencies. Let us first look at a short-term case. Suppose you want to predict the last word in the text, “The clouds are in the _____”. The most obvious answer is “sky”. You do not need any further context to predict the last word in this example.
Now consider this example: “I have been staying in Germany for the last 10 years. I can speak fluent _____”. To predict the last word, you need the context of Germany, and the most suitable answer is “German”. Here the gap between the relevant information and the point where it is needed can become very large. LSTMs help you solve this problem.
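To see why a large gap is hard for a plain RNN, consider a deliberately simplified toy case: a scalar, linear recurrence h_t = w * h_{t-1}. Real RNNs use weight matrices and activation derivatives, so this is only an analogy, but it shows how the gradient linking a late step to an early one shrinks or grows exponentially with the number of steps between them.

```python
def gradient_through_time(w, steps):
    """d h_T / d h_0 for the linear recurrence h_t = w * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # the chain rule contributes one factor of w per timestep
    return grad

for w in (0.5, 1.5):
    gradients = [gradient_through_time(w, t) for t in (1, 10, 50)]
    print(w, gradients)
```

With w = 0.5 the gradient is vanishingly small after 50 steps, and with w = 1.5 it explodes. Gated architectures such as LSTM are designed to keep this signal from dying out.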
Backpropagation Through Time
Backpropagation through time (BPTT) is the backpropagation algorithm applied to an RNN that takes time-series data as its input. In an RNN, one input is fed into the network at a time, and a single output is obtained. In BPTT, however, the current input and the previous inputs are used together. Each position in the sequence is called a timestep, and training unrolls the network over a window of many timesteps at once. Once the network has processed that window and produced its outputs, the outputs are used to calculate and accumulate the errors. Finally, the network is rolled back up, and the weights are recalculated and updated to reduce those errors.
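The procedure can be sketched on a scalar RNN small enough to follow by hand. The assumptions here are a tanh activation, two shared weights w and u, and a squared-error loss taken on the final output only (a full implementation would usually accumulate a loss at every timestep). The result is compared with a finite-difference estimate as a sanity check.

```python
import math

def forward(w, u, xs):
    """Scalar RNN h_t = tanh(w * h_{t-1} + u * x_t); returns [h_0, ..., h_T]."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, xs, target):
    return 0.5 * (forward(w, u, xs)[-1] - target) ** 2

def bptt(w, u, xs, target):
    """Gradients of the loss with respect to the shared weights w and u."""
    hs = forward(w, u, xs)
    dh = hs[-1] - target                 # dL/dh_T
    dw = du = 0.0
    for t in range(len(xs), 0, -1):      # walk backwards through the timesteps
        da = dh * (1.0 - hs[t] ** 2)     # back through tanh
        dw += da * hs[t - 1]             # accumulate: w is reused at every step
        du += da * xs[t - 1]             # accumulate: u is reused at every step
        dh = da * w                      # pass the error to the previous step
    return dw, du

xs, target, w, u = [0.5, -0.1, 0.3, 0.8], 0.2, 0.7, -0.4  # made-up toy values
dw, du = bptt(w, u, xs, target)

eps = 1e-6
dw_num = (loss(w + eps, u, xs, target) - loss(w - eps, u, xs, target)) / (2 * eps)
du_num = (loss(w, u + eps, xs, target) - loss(w, u - eps, xs, target)) / (2 * eps)
print(dw, dw_num)
print(du, du_num)
```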
Implementation of RNNs using Keras
Here is how you can implement an RNN using Keras on the IMDB dataset:
Import the libraries
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
Define the parameters
max_features = 10000 # Number of words to consider as features
maxlen = 500 # Maximum sequence length
batch_size = 32
embedding_dim = 32 # Dimension of word embeddings
Load the dataset
# max_features must already be defined, since it limits the vocabulary size
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
Pad the data so that every sequence has the same length
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
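To make the effect of padding concrete, here is a small pure-Python sketch of what this step does to each review. It assumes the default behaviour of pad_sequences as I understand it: zeros are added at the front of short sequences, and long sequences keep only their last maxlen items. The helper pad_pre is hypothetical and is not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep the last `maxlen` items of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # drop items from the front if too long
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

example = pad_pre([[1, 2, 3], [4], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(example)
# → [[0, 1, 2, 3], [0, 0, 0, 4], [3, 4, 5, 6]]
```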
Creating a simple RNN model
def create_simple_rnn_model():
    model = keras.Sequential()
    model.add(layers.Embedding(max_features, embedding_dim, input_length=maxlen)) # Embedding layer
    model.add(layers.SimpleRNN(32))
    model.add(layers.Dense(1, activation="sigmoid")) # Output layer (binary classification)
    return model
model = create_simple_rnn_model()
Compiling the model
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
Check the model's architecture using summary
model.summary()
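The parameter counts reported by the summary can be cross-checked by hand. The arithmetic below assumes the layer sizes used above (a 10,000-word vocabulary, 32-dimensional embeddings, 32 recurrent units, one output unit) and the usual formula for a simple recurrent layer: input weights plus recurrent weights plus one bias per unit.

```python
max_features, embedding_dim, rnn_units = 10000, 32, 32

# Embedding: one vector of length embedding_dim per word in the vocabulary.
embedding_params = max_features * embedding_dim

# SimpleRNN: input weights + recurrent weights + one bias per unit.
rnn_params = rnn_units * (embedding_dim + rnn_units + 1)

# Dense: one weight per recurrent unit plus a single bias.
dense_params = rnn_units * 1 + 1

total_params = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total_params)
# → 320000 2080 33 322113
```

Almost all of the parameters sit in the embedding layer. The recurrent layer itself is small because its weights are shared across all 500 timesteps.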
Training the model
history = model.fit(x_train, y_train, epochs=5, batch_size=batch_size, validation_split=0.2)
Evaluating the model
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")
Visualizing the results
import matplotlib.pyplot as plt
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
Applications of Recurrent Neural Networks
RNNs have a wide range of applications, such as:
- Solving time-series problems such as stock market prediction.
- Solving text mining and sentiment analysis problems.
- Developing NLP technology, including machine translation, speech recognition, and language modeling.
- Image captioning, video tagging, text summarization, image recognition, facial recognition, and OCR applications.
Conclusion
Traditional feedforward networks cannot solve time-series and sequential-data problems, while RNNs can handle such problems efficiently. In this tutorial, you learned what RNNs are and why they are needed, and explored their types, their architecture, how LSTMs and GRUs address gradient problems, and their applications.